Provable Super-Convergence With a Large Cyclical Learning Rate
Authors
Abstract
Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR): we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019], where they show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called “super-convergence”. We prove that our scheme excels in problems where the Hessian exhibits a bimodal spectrum, i.e., the eigenvalues can be grouped into two clusters (small and large). The unstably large step is the key to enabling fast convergence over the small eigen-spectrum.
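The cycle described above can be sketched numerically on a diagonal quadratic. This is a minimal illustration, not the letter's analysis: the spectrum, step sizes, and cycle length below are hypothetical choices, picked only to show one large unstable step (which contracts the small-eigenvalue error) followed by several small stable steps (which re-contract the large-eigenvalue error it blows up):

```python
import numpy as np

# f(x) = 0.5 * x^T H x with a bimodal Hessian spectrum:
# a "small" cluster near mu and a "large" cluster near L (hypothetical values).
mu, L = 1.0, 100.0
eigs = np.array([mu, 1.1 * mu, 0.9 * L, L])
x = np.ones_like(eigs)  # initial error along each eigendirection

eta_small = 1.0 / L  # stable step: < 2/L, contracts every coordinate
eta_big = 1.0 / mu   # unstable step: >> 2/L, diverges on the large cluster

for cycle in range(5):
    # One large, unstable step: gradient descent multiplies each coordinate
    # by (1 - eta * lambda), nearly annihilating the small-eigenvalue error
    # while amplifying the large-eigenvalue error.
    x = (1 - eta_big * eigs) * x
    # Several small, stable steps compensate for the blow-up on the
    # large cluster, since |1 - eta_small * L| is close to 0 there.
    for _ in range(20):
        x = (1 - eta_small * eigs) * x
```

After a handful of cycles the error collapses geometrically in all directions, even though the large step alone would diverge; the number of cycles needed scales with the log of the condition number rather than the condition number itself, which is the effect the abstract describes.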
Similar Resources
A Randomized Asynchronous Linear Solver with Provable Convergence Rate
Asynchronous methods for solving systems of linear equations have been researched since Chazan and Miranker published their pioneering paper on chaotic relaxation in 1969. The underlying idea of asynchronous methods is to avoid processor idle time by allowing the processors to continue to work and make progress even if not all progress made by other processors has been communicated to them. His...
Convergence of Gradient Dynamics with a Variable Learning Rate
As multiagent environments become more prevalent we need to understand how this changes the agent-based paradigm. One aspect that is heavily affected by the presence of multiple agents is learning. Traditional learning algorithms have core assumptions, such as Markovian transitions, which are violated in these environments. Yet, understanding the behavior of learning algorithms in these domains...
Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates
In this paper, we show a phenomenon, which we named “super-convergence”, where residual networks can be trained using an order of magnitude fewer iterations than is used with standard training methods. One of the key elements of superconvergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates im...
A Collocation Method for Integral Equations with Super-Algebraic Convergence Rate
We consider biperiodic integral equations of the second kind with weakly singular kernels such as they arise in boundary integral equation methods. The equations are solved numerically using a collocation scheme based on trigonometric polynomials. The weak singularity is removed by a local change to polar coordinates. The resulting operators have smooth kernels and are discretized using the ten...
Journal
Journal title: IEEE Signal Processing Letters
Year: 2021
ISSN: 1558-2361, 1070-9908
DOI: https://doi.org/10.1109/lsp.2021.3101131